| # | model | pass@1 | win_rate | elo |
|---|---|---|---|---|
| 0 | gpt-4-0613+cot | 0.755 | 0.947 | 1540.237 |
| 1 | gpt-4-turbo-2024-04-09+cot | 0.757 | 0.865 | 1380.317 |
| 2 | gpt-3.5-turbo-0613+cot | 0.503 | 0.765 | 1259.673 |
| 3 | gpt-4-0613 | 0.698 | 0.717 | 1203.368 |
| 4 | claude-3-opus-20240229+cot | 0.734 | 0.685 | 1174.246 |
| 5 | gpt-4-turbo-2024-04-09 | 0.685 | 0.683 | 1174.064 |
| 6 | codellama-34b+cot | 0.501 | 0.675 | 1173.731 |
| 7 | codellama-13b+cot | 0.474 | 0.604 | 1115.937 |
| 8 | claude-3-opus-20240229 | 0.642 | 0.554 | 1068.591 |
| 9 | codellama-7b+cot | 0.404 | 0.542 | 1064.422 |
| 10 | codetulu-2-34b | 0.492 | 0.525 | 1049.250 |
| 11 | codellama-34b | 0.472 | 0.509 | 1036.052 |
| 12 | deepseek-base-33b | 0.465 | 0.489 | 1022.381 |
| 13 | deepseek-instruct-33b | 0.465 | 0.465 | 1002.395 |
| 14 | gpt-3.5-turbo-0613 | 0.490 | 0.461 | 1000.000 |
| 15 | codellama-python-34b | 0.439 | 0.457 | 998.455 |
| 16 | phind | 0.472 | 0.450 | 993.536 |
| 17 | codellama-13b | 0.425 | 0.442 | 989.942 |
| 18 | deepseek-base-6.7b | 0.419 | 0.438 | 985.885 |
| 19 | mixtral-8x7b | 0.393 | 0.411 | 965.541 |
| 20 | codellama-python-13b | 0.397 | 0.411 | 965.146 |
| 21 | magicoder-ds-7b | 0.417 | 0.379 | 939.551 |
| 22 | wizard-34b | 0.427 | 0.376 | 937.456 |
| 23 | codellama-python-7b | 0.373 | 0.341 | 912.804 |
| 24 | codellama-7b | 0.360 | 0.327 | 901.755 |
| 25 | mistral-7b | 0.350 | 0.317 | 894.167 |
| 26 | deepseek-instruct-6.7b | 0.374 | 0.297 | 877.088 |
| 27 | wizard-13b | 0.365 | 0.295 | 874.113 |
| 28 | phi-2 | 0.316 | 0.293 | 873.544 |
| 29 | starcoderbase-16b | 0.313 | 0.280 | 863.138 |
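The table anchors gpt-3.5-turbo-0613 at exactly elo = 1000.000, which suggests it is the reference model, but the source does not state how the ratings were fit (they are likely estimated jointly over all pairwise comparisons, e.g. with a Bradley–Terry model, so individual rows will not reproduce exactly from the aggregate `win_rate` column). As an illustrative sketch only, the standard logistic Elo mapping converts a head-to-head win probability against the anchor into a rating difference; the function names and the assumption that 1000 is the anchor are mine, not the source's:

```python
import math

ANCHOR_ELO = 1000.0  # gpt-3.5-turbo-0613 sits at exactly 1000.000 in the table
SCALE = 400.0        # conventional Elo scale: 400 points ~ 10:1 odds


def elo_from_win_prob(p, anchor=ANCHOR_ELO, scale=SCALE):
    """Rating implied by probability p of beating the anchor model."""
    if not 0.0 < p < 1.0:
        raise ValueError("win probability must be strictly between 0 and 1")
    return anchor + scale * math.log10(p / (1.0 - p))


def win_prob_from_elo(elo, anchor=ANCHOR_ELO, scale=SCALE):
    """Inverse mapping: expected win probability against the anchor."""
    return 1.0 / (1.0 + 10.0 ** ((anchor - elo) / scale))


# A model that beats the anchor 75% of the time sits ~191 points above it.
print(round(elo_from_win_prob(0.75), 1))
```

Under this mapping a rating gap of 400 points corresponds to 10:1 odds, and a gap of 0 points to a 50% win probability; the two functions are exact inverses of each other.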